feat(telemetry): roll up http requests#946
Conversation
|
/coder-agents-review |
There was a problem hiding this comment.
Good addition. The tumbling-window aggregation design is proportional to the problem, the TelemetryReporter interface extraction keeps coupling narrow, and the NOOP sentinel pattern for throwaway clients is clean. The test harness is well-structured with fake timers, meaningful assertions, and non-vacuous coverage.
Severity summary: 1 P2, 13 P3, 1 P4, 5 Nit, 2 Note.
The P2 is the UUID regex version nibble constraint ([1-5]), which rejects UUIDv6/v7/v8 in the fallback normalizer. Seven reviewers flagged it independently. The fix is one character.
Of the P3s, the most impactful cluster: dispose() drops accumulated data without flushing, the timer callback has no try/catch (a throw permanently kills the chain), and average()/percentile95() produce NaN/undefined on empty input. All are straightforward defensive fixes.
On the design side: the route normalization fallback passes name-bearing tail segments (like usernames after /members/) through unchanged, which partially contradicts the stated goal of stripping user/workspace names. And the interceptor registration order means telemetry records phantom errors for requests that are transparently retried by the cert-refresh or auth interceptors. Neither is urgent, but both are worth acknowledging.
One reviewer questioned whether the windowSeconds config knob earns its ~115 lines of schema, reader, watcher, update, and tests given nobody will change the default. Worth a human decision.
"A public surface with zero in-tree callers is a foot-gun, not dead code." (Kite, on the UUID fallback path)
🤖 This review was automatically generated with Coder Agents.
Drop the recorder interface, noop sentinels, and creation helper — construct HttpRequestsTelemetry unconditionally and let the noop reporter swallow .log() calls. Flatten route normalization rules, switch buckets to a nested map (no string key encoding), and stop pushing zero-duration samples when metadata is absent so p95 isn't skewed. Drop the LOCAL_SINK_SETTING backwards-compat alias.
cd6bd55 to
64f09d1
Compare
- UUID regex matches any version (v6/v7/v8 now normalize). Then dropped
UUID/numeric heuristics entirely; route normalization uses explicit
rules only. Added api/v2/workspaces/{id}/builds/{id} for the main
composite case. Trailing literal segments still pass through.
- Flush the pending bucket on dispose so the last partial window is
not lost.
- Drop the coder.telemetry.local.httpRequests.windowSeconds knob and
hardcode 60s, removing the schema, reader, watcher, update method,
and related tests.
- Rename addLoggingInterceptors -> addRequestInterceptors. Document
that cert-refresh retries re-enter the chain so each attempt is
recorded in telemetry.
- Add explicit rule for /api/v2/organizations/{id}/members/{name}.
- Use toSorted, drop the unused export, comment normalizeByRule.
- Drop "aggregated" from the CHANGELOG entry.
- Tests: dispose-flush, no-rule fallback, undefined / "" inputs.
|
/coder-agents-review |
There was a problem hiding this comment.
Strong response to R1. All 19 addressed findings verified: UUID regex eliminated entirely (class-of-bug fix), config surface removed, dispose flushes, try/finally protects the timer chain, empty-array guards return 0, naming and docs cleaned up. The normalization redesign (position-based rules, no UUID regex) is cleaner than what it replaced.
DEREM-20 (contested, interceptor ordering): panel closed 3/3 accept. The author's defense holds: each interceptor-level attempt is a real network event, cert-refresh frequency is bounded, and the added comment at coderApi.ts:520-522 documents the counting semantics.
Severity summary for new findings: 1 P2, 4 P3.
The P2 is a regression from the R1 refactor: removing the NOOP sentinel check means every CoderApi.create() now constructs a real HttpRequestsTelemetry with a running 60-second timer, including 5-6 throwaway call sites that never dispose. The fix is straightforward: lazy-start the timer on first record() instead of in the constructor.
"What happens when the extension runs for a week in a remote window?" (Hisoka, on the timer leak)
🤖 This review was automatically generated with Coder Agents.
- Drop UUID and numeric heuristics from route normalization. Use
explicit rules only; add api/v2/workspaces/{id}/builds/{id} for the
main composite path. Trailing literal segments still pass through.
- Add a rule for api/v2/organizations/{id}/members/{name}.
- Flush the pending bucket on dispose so the last partial window is
not lost. Wrap the flush in try/finally so a throw still clears
state.
- Track the start of each window and emit the actual elapsed seconds
in the http.requests event so consumers can compute a real rate.
- Skip the timer when constructed with NOOP_TELEMETRY_REPORTER so
throwaway CoderApi clients don't leak it.
- Drop the coder.telemetry.local.httpRequests.windowSeconds knob and
hardcode 60s, removing the schema, reader, watcher, update method,
and related tests.
- Rename addLoggingInterceptors to addRequestInterceptors. Note that
cert-refresh retries re-enter the chain, so each attempt is counted.
- toSorted, drop the unused export, document the public surface.
- Drop "aggregated" from the CHANGELOG entry.
- Tests: dispose-flush, NOOP timer-leak guard, no-rule fallback,
undefined / "" inputs.
4cc8b77 to
22fa9eb
Compare
Summary
http.requeststelemetry rollups for active(method, route)buckets. Every 60 seconds the aggregator emits one event per active bucket with status counts, network errors, avg duration, and p95.CoderApiclients. Throwaway clients passNOOP_TELEMETRY_REPORTERand never schedule a timer.Closes #907.
Validation
pnpm test:extension ./test/unit/settings/telemetry.test.ts ./test/unit/logging/httpRequestsTelemetry.test.ts ./test/unit/api/coderApi.test.tspnpm test:extensionpnpm format:checkpnpm lintpnpm typecheckImplementation
HttpRequestsTelemetryaggregates per(method, route)and emits the rollup every 60s. Tracks the actual window start so a partial flush ondispose()reports a usable elapsed value. Skips the timer entirely when constructed with the NOOP reporter so throwaway clients don't leak it.api/v2/workspaces/{id},api/v2/users/{name}/keys/{id}, etc.). No UUID or numeric parsing: if a new endpoint needs masking, add a rule.TelemetryReporterinterface andNOOP_TELEMETRY_REPORTERdecoupleCoderApifrom the concreteTelemetryService.Generated by Coder Agents.